ggml : add ggml_scale_bias #14417

Merged: 19 commits merged into ggml-org:master on Jul 9, 2025

Conversation

@ngxson (Collaborator) commented on Jun 27, 2025

Ref discussion: #14400 (comment)

Added ggml_scale_bias(ctx, a, s, b) in this PR, which allows calculating x = a * s + b

I only added the Metal kernel for now, just for discussion. @ggerganov does this look good to you?

TODO: support other backends
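
A hedged usage sketch of the new op (graph construction only; the tensor size and the scale/bias values are illustrative, and actually evaluating the graph needs a backend and a compute call):

    #include "ggml.h"

    int main(void) {
        // small scratch context, just for the illustration
        struct ggml_init_params params = {
            /*.mem_size   =*/ 16*1024*1024,
            /*.mem_buffer =*/ NULL,
            /*.no_alloc   =*/ false,
        };
        struct ggml_context * ctx = ggml_init(params);

        struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);

        // x[i] = a[i]*2.0f + 1.0f -- the fused scale + bias this PR adds
        struct ggml_tensor * x = ggml_scale_bias(ctx, a, 2.0f, 1.0f);

        (void) x;
        ggml_free(ctx);
        return 0;
    }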

@ngxson (Collaborator, Author) commented on Jun 27, 2025

I hope this won't have a significant impact on performance.

github-actions bot added the testing, ggml, and Apple Metal labels on Jun 27, 2025
@ggerganov (Member) left a comment

Think it's a useful extension of the operator.

Comment on lines 3968 to 3971:

    ggml_vec_scale_f32(nc, (float *) ((char *) dst->data + i1*nb1), s);
    if (b != 0.0f) {
        ggml_vec_acc1_f32(nc, (float *) ((char *) dst->data + i1*nb1), b);
    }
@ggerganov (Member):

Merge these into ggml_vec_mad1_f32(). If you want, you can try to add a GGML_SIMD version using GGML_F32_VEC_FMA - it's quite simple. But you can also leave it as a basic for loop without SIMD.
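
A minimal scalar sketch of the merged helper, assuming the name from the review (the version eventually merged also gained a separate input pointer x; see the commit list at the end):

    // one pass over y computing y[i] = y[i]*s + b, replacing the separate
    // scale + acc1 calls above; a GGML_SIMD variant would broadcast s and b
    // into GGML_F32_VEC registers and use GGML_F32_VEC_FMA in the main loop,
    // keeping this scalar loop for the leftover elements
    inline static void ggml_vec_mad1_f32(const int n, float * y, const float s, const float b) {
        for (int i = 0; i < n; ++i) {
            y[i] = y[i]*s + b;
        }
    }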

@ehoogeveen-medweb commented
Quick question: Is the "scale-bias" nomenclature more appropriate here than "multiply-add"? From an outsider perspective familiar with fused multiply-add ("MAD") operations, I didn't realize that "scale" meant "multiply" and "bias" meant "add" until I took a closer look.

@ngxson (Collaborator, Author) commented on Jun 29, 2025

"multiply-add" can be confusing because we already have ggml_mul and ggml_add, which take 2 tensors as input.

ggml_scale, on the other hand, takes a tensor and a scalar value as input.

So ggml_scale_bias is the best fit, because the naming doesn't clash with "multiply" or "add". "Bias" means we add a scalar value, not a tensor.
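
To illustrate the distinction (all three ops exist in ggml; ctx, a, and b here are a hypothetical context and F32 tensors):

    struct ggml_tensor * t1 = ggml_mul(ctx, a, b);                  // "multiply": two tensor inputs
    struct ggml_tensor * t2 = ggml_scale(ctx, a, 2.0f);             // "scale": tensor times a scalar
    struct ggml_tensor * t3 = ggml_scale_bias(ctx, a, 2.0f, 1.0f);  // scalar scale plus scalar bias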

@ngxson (Collaborator, Author) commented on Jul 8, 2025

@ggerganov On second thought, I'm worried that extending the kernel for ggml_scale will have a negative impact on performance.

I had a look into ggml_add1 and just realized that it's actually ggml_add with broadcasting under the hood (on CUDA, the same kernel is used for the 2 ops) - so I'm wondering: should we add a new GGML_OP_ADD1 that supports adding a scalar value?

My idea (see the sketch below) is that:

  • ggml_add1(ctx, a, b) will simply call ggml_add(ctx, a, b) under the hood
  • ggml_add1_scalar(ctx, a, val) is added, which supports a scalar value
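
A hypothetical sketch of that proposal (it was ultimately dropped in favor of ggml_scale_bias, and ggml_add1_scalar was never added; the body is illustrative only):

    // proposal sketch only -- not what was merged (assumes "ggml.h")
    struct ggml_tensor * ggml_add1(struct ggml_context * ctx,
                                   struct ggml_tensor  * a,
                                   struct ggml_tensor  * b) {
        // ggml_add already broadcasts the single-element tensor b over a,
        // so ADD1 reduces to ADD; ggml_add1_scalar(ctx, a, val) would instead
        // stash val in op_params, the way ggml_scale stores its scalar
        return ggml_add(ctx, a, b);
    }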

Edit: see comment below

@jeffbolznv (Collaborator) commented
What is the concern with performance? Adding a constant value is about as cheap as it gets.

@ngxson (Collaborator, Author) commented on Jul 8, 2025

Hmm, ok, maybe I'm just too concerned about the fact that cheap ops can still make an impact if they are called repeatedly.

A quick search in llama.cpp reveals that ggml_scale is called at most once or twice per layer. So I think the impact won't be as significant as I thought, unless a model has hundreds of thousands of layers.

So I'll go back to the initial proposal of ggml_scale_bias for now.

github-actions bot added the Nvidia GPU, Vulkan, SYCL, Ascend NPU, and OpenCL labels on Jul 8, 2025
Comment on lines 356 to 357:

    vDSP_vsmul(y, 1, &s, y, 1, n);
    vDSP_vsadd(y, 1, &b, y, 1, n);
@ggerganov (Member) commented on Jul 9, 2025:
There is vDSP_vsmsa.

@ngxson (Collaborator, Author) replied:
implemented in 563aca0
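
A minimal sketch of that replacement, assuming Accelerate's vDSP API (vDSP_vsmsa computes D[n] = A[n]*B + C with scalar B and C, so the two calls above collapse into one fused pass):

    #include <Accelerate/Accelerate.h>

    // y[i] = y[i]*s + b in a single vDSP call, in place with stride 1
    static void scale_bias_vdsp(float * y, const float s, const float b, const int n) {
        vDSP_vsmsa(y, 1, &s, &b, y, 1, (vDSP_Length) n);
    }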

Comment on lines 359 to 388:

    #if defined(__ARM_FEATURE_SVE)
        const int sve_register_length = ggml_cpu_get_sve_cnt() * 8;
        const int ggml_f32_epr = sve_register_length / 32; // SVE128:4, SVE256:8, SVE512:16
        const int ggml_f32_step = 2 * ggml_f32_epr;

        GGML_F32_VEC vs = GGML_F32_VEC_SET1(s);
        GGML_F32_VEC vb = GGML_F32_VEC_SET1(b);

        const int np = (n & ~(ggml_f32_step - 1));
        svfloat32_t ay1;
        svfloat32_t ay2;
        for (int i = 0; i < np; i += ggml_f32_step) {
            ay1 = GGML_F32_VEC_LOAD(y + i);
            ay1 = GGML_F32_VEC_FMA(ay1, vs, vb);
            GGML_F32_VEC_STORE(y + i, ay1);

            ay2 = GGML_F32_VEC_LOAD(y + i + 1*ggml_f32_epr);
            ay2 = GGML_F32_VEC_FMA(ay2, vs, vb);
            GGML_F32_VEC_STORE(y + i + 1*ggml_f32_epr, ay2);
        }
        // leftovers: the maximum number of leftover elements will be less than
        // ggml_f32_epr; apply predicated ops on the available elements only
        if (np < n) {
            svbool_t pg = svwhilelt_b32(np, n);
            ay1 = svld1_f32(pg, y + np);
            ay1 = svmul_f32_m(pg, ay1, vs);
            ay1 = svadd_f32_m(pg, ay1, vb);
            svst1_f32(pg, y + np, ay1);
        }
    #else
@ggerganov (Member):
Remove this SVE implementation - we don't have hardware to test it yet.

@ngxson (Collaborator, Author) replied:
done in 50c678f

ngxson marked this pull request as ready for review on July 9, 2025 at 09:57
@ngxson (Collaborator, Author) commented on Jul 9, 2025

The only backend that doesn't currently support the op is CANN. @ggerganov, could you tag the CANN contributors?

Also, it would be nice if we could launch the full CI to test CUDA and SYCL, but I'm not sure how to do that (or whether it's even possible when the PR is created from a forked repo).

@ggerganov (Member) left a comment

Running this through ggml-ci would be nice. You can just push a tmp branch and check its results - no need to recreate the PR.

    memcpy(&scale, dst->op_params, sizeof(float));
    memcpy(&bias, (float *) dst->op_params + 1, sizeof(float));
@ggerganov (Member):

Make this more consistent:

    memcpy(&scale, (float *) dst->op_params + 0, sizeof(float));
    memcpy(&bias,  (float *) dst->op_params + 1, sizeof(float));

Comment on lines 358 to 363:

    #if defined(__ARM_FEATURE_SVE)
        // scalar; TODO: write SVE code
        for (int i = 0; i < n; ++i) {
            y[i] = y[i]*s + b;
        }
    #else
@ngxson (Collaborator, Author):

GGML_F32_STEP doesn't seem to be defined on ARM SVE, so I left the scalar impl here.

@ggerganov (Member):

Yes, it's ok for now. I'm having some doubts about these SVE branches - might end up removing them altogether.

@ngxson (Collaborator, Author) commented on Jul 9, 2025

I ran the ggml-ci and it passed: b7c6ece

Will merge this once the CI of this PR is green.

ngxson merged commit 98bab63 into ggml-org:master on Jul 9, 2025 (49 checks passed)
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jul 9, 2025
* origin/master:
llama : support Jamba hybrid Transformer-Mamba models (ggml-org#7531)
ggml : add ggml_scale_bias (ggml-org#14417)
qnixsynapse pushed a commit to menloresearch/llama.cpp that referenced this pull request Jul 10, 2025
* ggml : add ggml_scale_bias
* ggml_vec_mad1_f32
* add more simd
* add CUDA
* sycl
* vulkan
* cann (placeholder)
* opencl
* will this fix cpu?
* fix cuda
* suggestions from coderabbit
* fix cann compile error
* vDSP_vsmsa
* rm __ARM_FEATURE_SVE
* use memcpy for op params
* make code looks more consistent
* use scalar for __ARM_FEATURE_SVE
* add x param to ggml_vec_mad1_f32